-
Notifications
You must be signed in to change notification settings - Fork 601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Report minimum storage threshold targets #10710
Conversation
223645c
to
91f59f8
Compare
* otherwise, we fall through and evaluate the space wanted metric using | ||
* any configured retention policies _without_ overriding based on cloud | ||
* retention settings. the assumption here is that retention and not | ||
* compaction is what will limit on disk space. this heuristic should be | ||
* updated if we decide to also make "nice to have" estimates based on | ||
* expected compaction ratios. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not following why we're not applying the cloud overrides here. Aren't they honored when applying retention, even on a compacted topic? (i could be wrong, would appreciate a pointer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain, and if it makes sense to you, i'll update this comment to clarify the case in the code:
Compacted topics aren't subject to local retention (they always remain whole on local storage), so the only way that retention comes into play for a compacted topic is if the policy is compat,delete
. However, the override_retention_config
helper doesn't take compaction into account. So if we were to apply the cloud overrides here for a compacted topic, then the retention policy we used in calculating "nice to have" wouldn't reflect what would actually happen when housekeeping ran.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand: it wasn't clicking that retention settings applied to a compact,delete
topic followed the overarching retention settings rather than the local settings for tiered storage. If that's the case, I think this policy makes sense.
[](storage::usage acc, storage::usage u) { return acc + u; }); | ||
|
||
// extrapolate out for the missing period of time in the retention period | ||
auto missing_bytes = (usage.total() * missing.value()) / duration.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is saying "I expect to see same throughput that ingested the latest segments for the entire remainder of the retention period". This seems like a reasonable, but it makes me wonder about bursty workloads, where a large number of records are written in the span of a few dozens of seconds, and then stopped. In such a case, we could end up with a small duration
and large missing
and end up with a very large value for missing_bytes
.
Maybe it makes sense to compute the expected throughput with now() - start_timestamp
instead of end_timestamp - start_timestamp
, though given these are user-input timestamps, it's probably best to avoid using now()
I suppose regardless it might be better to overestimate the wanted bytes anyway for the goal of avoiding out of space issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to capture this issue with a heuristic like not making any estimate unless we have say 10 seconds worth of data. But this is going to just be a whack-a-mole game I'm afraid. Perhaps we would do well to do add a cap here, too, such as double or triple the amount of data currently on disk. This may work well since the goal isn't necessarily to observe the system once and know how much space we need. Rather, this will be monitoring periodically. So capping will let the estimate continue to portray growth requirement, while also eliminating huge bursts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, picking something that's good in all cases is hard. I'd lean harder away from this if this significantly impacted workloads (e.g. if we were to base rejection of produce messages on this). But given the ultimate goal here is to just apply retention more aggressively, this is probably fine.
Might be worth calling out in a header comment that this is an estimate and is subject to being way off, so don't use it too aggressively (like rejecting writes based on it)
Force-pushed to resolve merge conflicts in admin_server. |
Force-push:
|
Force-push: fix python formatting |
ping @andrwng when you get a chance |
Force-push: increased fuzz threshold on release it seems we write a bit faster than on debug which I was using for the calibration. |
The state computed and reported by the public disk_usage function is going to grow, and so split this function up so that it doesn't become too unwieldy. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
When collecting a storage report, we add a new section that descibes target sizes with different meanings. This commit adds a minimum capacity target which is the minimum amount of capacity that log storage needs to be able function at a bare minimum. The minimum is computed as the max segment size (potentially different for each partition) * the minimum reserved number of segments (configurable) * the total number of partitions. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Useful for making sure we aren't missing any call sites--a possible scenario when using the {.x=, .y=} form of construction. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Checks in cases of multiple partitions, topics, and settings for both the minimum reserved number of segments and topic segment size settings. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
It is useful to examine retention configurations without having cloud storage related overrides in the case of estimating the amount of disk capacity needed. Used in later commits. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Reports how much capacity a partition (or all raft) would like to have. This commit only considers sized based retetion requirements. Time will be added in a later commit. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Examines recently written data to a partition and tries to extrapolate how much capacity will be needed based on the apparent rate of data production. Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Signed-off-by: Noah Watkins <noahwatkins@gmail.com>
Force-push: fix conflicts with clang-format16 changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping, and for clarifying. I think this looks like a good starting point as far as heuristics go. Remaining comments/clarifications aren't blocking.
* otherwise, we fall through and evaluate the space wanted metric using | ||
* any configured retention policies _without_ overriding based on cloud | ||
* retention settings. the assumption here is that retention and not | ||
* compaction is what will limit on disk space. this heuristic should be | ||
* updated if we decide to also make "nice to have" estimates based on | ||
* expected compaction ratios. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand: it wasn't clicking that retention settings applied to a compact,delete
topic followed the overarching retention settings rather than the local settings for tiered storage. If that's the case, I think this policy makes sense.
[](storage::usage acc, storage::usage u) { return acc + u; }); | ||
|
||
// extrapolate out for the missing period of time in the retention period | ||
auto missing_bytes = (usage.total() * missing.value()) / duration.value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, picking something that's good in all cases is hard. I'd lean harder away from this if this significantly impacted workloads (e.g. if we were to base rejection of produce messages on this). But given the ultimate goal here is to just apply retention more aggressively, this is probably fine.
Might be worth calling out in a header comment that this is an estimate and is subject to being way off, so don't use it too aggressively (like rejecting writes based on it)
Disk space management policy needs to be able to juggle space between the disk cache and log storage. To do this some thresholds will be useful, such as a minimum amount of space that log storage needs for normal functionality, and the minimum amount it needs to meet some basic policies like data retention goals.
This PR adds reporting for
retention.bytes
policiesretention.ms
policies(note that I realize
retention.{bytes,ms}
refers to something different than local retention. in this pr we use the normalized version which for cloud storage will be local retention. i believe i've been using this terminology rather loosely).Backports Required
Release Notes